Zhengyang Tang

PhoneWorld: Scaling Phone-Use Agent Environments

May 28, 2026

The Missing Piece in Pre-trained Model Evaluation: Reward-Guided Decoding Unlocks Task-Oriented Behavior Without Parameter Updates

May 27, 2026

Do Phone-Use Agents Respect Your Privacy?

Apr 02, 2026

Kimi K2.5: Visual Agentic Intelligence

Feb 02, 2026

CoRT: Code-integrated Reasoning within Thinking

Jun 12, 2025

Learning from Peers in Reasoning Models

May 12, 2025

RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques

Jan 24, 2025

Enabling Scalable Oversight via Self-Evolving Critic

Jan 10, 2025

Second Language (Arabic) Acquisition of LLMs via Progressive Vocabulary Expansion

Dec 16, 2024

Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models

Oct 10, 2024